Single-nucleus RNA-sequencing in pre-cellularization Drosophila melanogaster embryos

Our current understanding of the regulation of gene expression in the early Drosophila melanogaster embryo comes from observations of a few genes at a time, as with in situ hybridizations, or observation of gene expression levels without regards to patterning, as with RNA-sequencing. Single-nucleus RNA-sequencing however, has the potential to provide new insights into the regulation of gene expression for many genes at once while simultaneously retaining information regarding the position of each nucleus prior to dissociation based on patterned gene expression. In order to establish the use of single-nucleus RNA sequencing in Drosophila embryos prior to cellularization, here we look at gene expression in control and insulator protein, dCTCF, maternal null embryos during zygotic genome activation at nuclear cycle 14. We find that early embryonic nuclei can be grouped into distinct clusters according to gene expression. From both virtual and published in situ hybridizations, we also find that these clusters correspond to spatial regions of the embryo. Lastly, we provide a resource of candidate differentially expressed genes that might show local changes in gene expression between control and maternal dCTCF null nuclei with no detectable differential expression in bulk. These results highlight the potential for single-nucleus RNA-sequencing to reveal new insights into the regulation of gene expression in the early Drosophila melanogaster embryo.


Introduction
Early animal development is largely driven by maternally-deposited RNAs and proteins. In Drosophila melanogaster, zygotic gene expression is detected as early as the 10th nuclear cycle; however, zygotic genome activation primarily occurs during the 14 th nuclear cycle while maternal RNAs are degraded and cellularization begins [1,2]. Much of the difficulty in understanding the regulation of early embryonic gene expression lies in the challenge to simultaneously capture expression level and patterning. Classic examples of patterned gene expression and regulation originate from in situ hybridizations [3][4][5][6], however the nature of in frame. The homologous replacement template plasmid was constructed using a pUC19 backbone and~1 kb homology arms generated by PCR (5' homology arm primers: CCACAAA-GAAACGTTAGCTAGTTCC and TCCTATGGACAAATTGGATTTGTTTTGG, 3' homology arm primers: CCAAGGAGGACAAAAAAGGACGAG and CGTGAGTGGCGCGTGATC). Repair template was coinjected into Cas9-expressing embryos (Rainbow Transgenic Flies, Camarillo, California), along with two guide RNAs (ATTTGTCCATAGGAATGCCA, TGTCCATAGGAATGCCAAGG) expressed from a U6:3 promoter on a modified version of the pCFD3 plasmid [28]). Resulting flies were crossed to flies containing chromosome 3 balancer chromosomes, and screened by genotyping PCR. Putative hits were further screened by PCR and sequencing of the entire locus using primers outside the homology arms (CATTAGAATT-CAAGGGCCATCAG and CACTTGAAGGATGGCTCG). A successful insertion line was recombined with an FRT site on chromosome 3L at cytosite 80B1 (Bloomington stock # BL1997).

Fly husbandry
All stocks were fed standard Bloomington food from LabExpress and maintained at room temperature unless otherwise noted. We used the FLP-DFS (dominant female sterile) technique [29] to generate dCTCF mat-/embryos. First, we crossed virgin hsFLP, w � ;; Gl � /TM3 females to w � ;; ovo D , FRT2A(mw)/TM3 males (Bloomington Drosophila Stock Center ID: 2139). From this cross, we selected hsFLP,w � ;; ovo D , FRT2A(mw)/TM3 males and crossed them to virgin CTCF � ,FRT2A/TM3 females. Larvae from this cross were heat-shocked on days 4, 5, and 6 for at least two hours in a water bath at 37˚C. Upon hatching, virgin hsFLP, w � /+; CTCF � , FRT2A (mw)/ovoD � , FRT2A(mw) females were placed into a small cage with their male siblings. Flies were fed every day with yeast paste (dry yeast pellets and water) spread onto apple juice agar plates. These crosses were conducted simultaneously with another insulator protein, and control embryos were collected from the ovo D line used to generate those germline clones (Bloomington Drosophila stock Center ID: 2149).

Western blots
Flies laid on grape-agar plates for two hours and embryos were either aged two hours at room temperature or taken directly after collection. Embryos were dechorionated with bleach, rinsed, and frozen in aliquots of~25 embryos at -80 C. Embryos were homogenized in 25 μl RIPA buffer (Sigma cat # R0278) supplemented with 1 mM DTT and protease inhibitors (Sigma cat # 4693116001) using a plastic pestle. After homogenization, samples were mixed with 25 μl 2x Laemmli buffer (Bio-Rad # 1610737EDU), boiled for 3 minutes, and spun at 21,000 x g for 1 minute. Samples were loaded onto Bio-Rad mini Protean TGX 4-20% gels (# 4561096) and run at 200V for 30 minutes. Protein was transferred at 350 mA for one hour to Immobilon PVDF membrane (Millipore-Sigma # IPVH00010). Blots were blocked for one hour in PBST (1x PBS with 0.1% Tween) with 5% nonfat milk, and stained with primary antibodies (courtesy of Maria Cristina Gambetta [27], 1:1000 in PBST with 3% BSA) for one hour. Blots were then washed 3 times for 3 minutes rotating in PBST and probed with an HRP-conjugated anti-Rabbit secondary antibody (Rockland Trueblot, # 18-8816-33, 1:1000 in PBST with 5% milk) for one hour. After extensive washing with PBST, blots were developed with Clarity ECL reagents (Bio-Rad # 1705060) and imaged. Validation of the loss of maternal dCTCF is shown in S1 Fig. remove embryos retained by the mothers overnight, followed by a 2 hour collection and 2 hour aging. Then, the embryos were dechorionated in 100% bleach for 1 minute, or until most of the embryos were floating, with regular agitation by a paintbrush. The embryos were transferred to a collection basket made of a 50 mL conical and mesh. After the embryos were rinsed with water, the embryos were transferred into an eppendorf tube containing 0.5% PBS-Tween. From this point forward, samples were kept on ice to prevent further aging of embryos. A minimum of 9 early to mid-nuclear cycle 14 embryos were sorted using an inverted compound light microscope and transferred to a 2 mL dounce containing 600 uL of lysis buffer (10 mM 10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 1% Bovine Serum Albumin, 1% RNase Inhibitor (Enzymatics, Part Num. Y9240L)) + 0.1% IGEPAL. The embryos were homogenized 20 times with a loose pestle and 10 times with a tight pestle. Pestles were rinsed with 100 uL lysis buffer + 0.1% IGEPAL after use. The resulting 800 uL of buffer and nuclei were transferred into an eppendorf tube, filtered with a 40 uM filter. Nuclei were pelleted by spinning for 5 minutes at 900 g and 4˚C, washed in 500 uL lysis buffer (without 0.1% IGEPAL), and pelleted again before resuspending the nuclei in 20 uL lysis buffer (without 0.1% IGEPAL). Nuclei concentration was then adjusted to 1000 uL nuclei per uL, then nuclei were barcoded with the 10X Chromium Single Cell 3' Gene Expression Kit (v3). Control and dCTCF mat-/nuclei were processed on separate days, then sequenced together with the Illumina NovaSeq (SP flow cell).

Data processing and analysis
We used kallisto-bustools [33] to generate a custom reference index and generate a nucleus x gene matrix. The data were analyzed in both Python and R, using primarily scVI via scvi-tools [34,35], scanpy [36], and custom scripts for analysis.
Control and dCTCF mat-/nuclei were filtered separately as follows: (1) nuclei were ranked by the number of UMIs detected and nuclei ranked below the expected number of nuclei (10,000) were removed; (2) nuclei with fewer than 200 expressed genes were removed; (3) nuclei with greater than 5% mitochondrial expression were removed; (4) nuclei with greater than 50,000 UMI counts were removed; (5) genes expressed in fewer than 3 nuclei were removed.
Prior to batch correction, the data were subset to the 6000 most highly variable genes using scanpy's dCTCF mat-/based on log1p normalized expression. We ran scVI with gene_likelihood = 'nb' to correct for batch effects.
The nuclei were clustered using the Leiden algorithm [37] within scanpy and visualized on a 2D UMAP [38]. Prior to batch correction, nuclei were clustered on log1p normalized gene expression. After batch correction, nuclei were clustered on the latent space derived from the scVI model. Marker genes representing each cluster were found using the sc.tl.rank_gen-es_groups function from scanpy with the Wilcoxon signed-rank test. In situ hybridizations of representative marker genes were obtained from the Berkeley Drosophila Genome Project [39][40][41]. Colors representing Leiden clusters were projected onto a virtual embryo using novoS-paRc [42,43].
Log2 fold change and associated p-values were obtained for each gene using diffxpy (https://diffxpy.readthedocs.io/). Statistically significant differential expression was determined following Bonferroni correction of the p-values and filtered for adjusted p-value less than 0.05 and absolute value of log2 fold change greater than or equal to 1.5. Intersecting sets of differentially expressed genes were found and visualized with an UpSet plot [44,45], following correction of adjusted p-values for the number of comparisons (multiplied by 11; 10 for the total number of clusters + 1 to include bulk differential expression).

Data and code availability
Raw sequencing data and.h5ad files are available on DataDryad: https://doi.org/10.6078/ D13D9R. Much of our analysis originated from work by Booeshaghi and Pachter (2020) [46] and Chari et al (2021) [47], with the addition of custom scripts.
All of the code used in the analysis and in generating the figures is available here: https:// github.com/aralbright/2021_AAMSME. Single-nucleus data pre-processing, batch correction and clustering, virtual in situ hybridization, and differential expression analyses are available in this GitHub repository as Google Colab notebooks. These notebooks are available for anyone to run from a web browser with the option to enter any genes of interest not discussed in this manuscript.

Results
To establish the use of single-nucleus RNA-sequencing for examining gene expression in the early Drosophila embryo prior to cellularization, we hand-sorted 10 to 20 early to mid-nuclear cycle 14 control and dCTCF mat-/embryos and isolated nuclei for single-nucleus RNAsequencing using 10x Genomics 3' Gene Expression. After filtering the data for high quality nuclei and correcting for non-biological variability (S2-S4 Figs), we used Leiden clustering [37] to detect distinct groups of nuclei from control and dCTCF mat-/embryos, altogether resulting in 8,400 nuclei across 10 clusters (Fig 1A) composed of both control and dCTCF mat-/- nuclei (Fig 1B). We also removed yolk and pole cell nuclei as these nuclei are not informative for patterned gene expression; however, the fact that subsets of nuclei clustered on marker gene expression for yolk or pole cell nuclei provided us with confidence that our data accurately represent single-nucleus expression (S5 Fig). After removing groups of nuclei as indicated, the nuclei no longer cluster according to expression of yolk or pole cell markers, indicating that our data are of high quality (S6 Fig). Once we finalized the dataset, we then asked whether gene expression in the clusters determined by the Leiden algorithm are truly distinct.
Given how well characterized patterned gene expression is in the early Drosophila embryo and that we found several distinct clusters of nuclei, we suspected that the clusters may represent different spatial regions within the embryo. Expression of the top marker genes representing each cluster is certainly distinct, and we noticed that many of these genes are expressed in patterns, namely fkh, tll, and htl (Fig 1C). To determine if single-nucleus RNA-sequencing in the early embryo can be spatially resolved, we examined in situ hybridizations of top marker genes for each cluster. We found that representative virtual and published in situ hybridizations of the top 20 marker genes (S1 Table) correspond to specific spatial regions within the embryo for clusters 0-7 (Fig 2A-2H and 2A'-2H'). We also found by projecting our nuclei onto a virtual embryo that the identities we assigned to each cluster correspond to the spatial identities of these clusters (Fig 2I).
The anterior, posterior, and ventral clusters are the most defined based on a projection of our nuclei onto a virtual embryo, while clusters that represent the middle of the embryo in general had less well defined borders (Fig 2I). Even so, our data shows that single-nucleus RNA-sequencing in the early Drosophila embryo yields information related to the spatial position of nuclei prior to dissociation. We should note however, that the virtual in situ hybridizations shown above, as well as additional virtual in situ hybridizations that represent each cluster, contain genes present in the list of reference genes used to generate the virtual patterns (see Ilp4, htl, fkh in Fig 2, and Antp, NetA, disco in S7 Fig). As such, we considered that the virtual in situ hybridizations may be biased in those cases; however, we believe the presence of several other genes representing each cluster with similar patterning validated with both virtual and published in situ hybridizations indicates that reference bias is not an issue. In the end, we were unable to determine a spatial identity for clusters 8 and 9; however, we decided to include these clusters in subsequent analyses because the nuclei passed our quality control filters. Interestingly, cluster 9 appears to be absent in dCTCF mat-/embryos (Fig 1A and 1B). Without knowing the identity of cluster 9, we can only speculate why this may be the case; however, this raises the possibility that dCTCF may play a role in nuclear fate.
In an effort to establish the use of single-nucleus RNA-sequencing to detect local changes in gene expression in embryos prior to cellularization, we then asked whether we could detect potential differential expression of genes in individual clusters, but not in bulk. In most clusters and in bulk, gene expression appears to be up-regulated upon loss of maternal dCTCF (S8 Fig). We also found that differentially expressed genes shared between all clusters and in bulk represent one of the largest shared sets (Fig 3A, left most black bar). However, a substantial number of candidate differentially expressed genes appear differentially expressed in single clusters (Fig 3A, blue bars). Many other genes appear differentially expressed in groups of clusters, but not in bulk. Because we found many candidate differentially expressed genes, we considered that this may be due to low expression given the sparsity of single-nucleus RNAsequencing; however, we found that the mean expression of candidate differentially expressed genes in single or multiple clusters overall does not have a substantially different pattern from that of non-differentially expressed genes (S9 Fig). Each of these curves are right-skewed, or most genes are expressed in low levels at less than 100 transcripts per million (TPM).
Altogether, these results show that single-nucleus RNA-sequencing in the early embryo can be used to detect candidate differentially expressed genes that would not appear in bulk-sequencing data.
Upon the loss of an early developmental factor like dCTCF, we expect to observe potential differential expression of patterned genes in specific clusters with single-nucleus RNAsequencing. Interestingly, stumps, a ventrally-expressed gene is differentially expressed in one ventral cluster, but not the other (Fig 3B). bowl, a gap gene primarily expressed in the anterior, appears to be up-regulated in the posterior-biased and one of the ventral clusters ( Fig 3C). Finally, Esp, a posterior-striped gene is differentially expressed in several clusters. Intriguingly, we did not detect differential expression in bulk for stumps, bowl, or Esp. Because dCTCF is required for proper expression of Abd-B [26], a Drosophila Hox gene, we also examined expression of several Drosophila Hox genes within each cluster and in bulk. Upon the loss of maternal dCTCF, Antp and abd-A are potentially differentially expressed in certain clusters, with potential differential expression of Antp in bulk data (S10 Fig). However despite the  Fig 1A. Virtual in situ hybridizations and projection of clusters onto a virtual embryo were generated using novoSpaRc [42,43]. https://doi.org/10.1371/journal.pone.0270471.g002

PLOS ONE
Single-nucleus RNA-sequencing in pre-cellularization Drosophila melanogaster embryos requirement of dCTCF for proper expression of Abd-B shown by in situ hybridization [26], we found no evidence of differential expression of Abd-B, in agreement with bulk RNA sequencing in larval CNS dCTCF mutants [27].
We cannot be certain whether or not specific genes are truly differentially expressed spatially without further investigation; however, our results demonstrate the use of single-nucleus RNA-sequencing to detect possible local changes in gene expression upon perturbation in the early embryo prior to cellularization.

Discussion
We conducted the above analyses in order to determine whether we could use single-nucleus RNA-sequencing as a means of understanding the regulation of gene expression in the early Drosophila embryo. First, we show that nuclei can be grouped into clusters represented by distinct gene expression. Then, we show that representative marker genes from the majority of the clusters recapitulate known patterns of expression. Importantly, we also present examples of potential differential expression of patterning genes in individual clusters upon loss of maternal dCTCF, but not in bulk.
Prior to this work, studies towards our understanding of the regulation of patterned gene expression in a spatial context included cytoplasmic RNAs in measures of expression. We must acknowledge the caveat that we do not know the extent to which maternal RNAs enter the nucleus and some of our results may reflect the presence of both maternal and zygotic RNAs. Nonetheless, we believe that single-nucleus RNA-sequencing is better suited as opposed to bulk RNA-sequencing to understand changes in gene expression in pre-cellularization embryos upon the loss of important developmental factors because of the ability to resolve local changes in expression. Supporting this notion, single-cell RNA-sequencing has already shown to resolve the loss of an entire cell fate in cellularized dorsoventral mutant embryos [9].
Whether or not the changes in gene expression that we observed have implications in embryonic development related to the loss of dCTCF is unclear without further investigation, such as single-molecule FISH to validate the observed changes in gene expression of particular RNAs. Ultimately, using single-nucleus RNA-sequencing to examine changes in gene expression upon the loss of important developmental factors has the potential to uncover perturbation responses previously undetected by bulk RNA-sequencing.  Differential expression of genes detected in one or more clusters, but not in bulk (A) UpSet plot for visualizing the top 40 shared sets of candidate differentially expressed genes between control and dCTCF mat-/nuclei within each cluster and in bulk. Horizontal bar plot (A, left) represents the total number of candidate differentially expressed genes within the cluster of the corresponding row. The vertical bar plot (A, top) represents the number of shared candidate differentially expressed genes for the conditions indicated below and is sorted from largest to smallest intersecting set, with each count representing a unique gene. Connected dots (black) represent the corresponding group of genes in the vertical bar plot above that might be differentially expressed in the clusters represented by rows with a filled in circle. Candidate genes differentially expressed in a single cluster are represented in blue. (B-D) log(scvi normalized expression) of (B) stumps, (C) bowl, (D) Esp in each cluster (left) and bulk (right) for control (teal) and dCTCF mat-/nuclei (pink). Asterisks indicate statistically significant differential expression (absolute value of expression > = 1.5 and Bonferroni corrected p-value < 0.05).

S1
https://doi.org/10.1371/journal.pone.0270471.g003 S1 Fig. (A) Western blotting of OreR, 0h and 2h dCTCF mat-/embryos using an antibody to Cp190, another insulator protein, as a control. (B) Western blotting of OreR, 0h and 2h dCTCF mat-/embryos using an antibody to dCTCF. The 2h embryos were aged for an additional 2 hours with the majority of the embryos representing nuclear cycle 14, the same time point at which we conducted single-nucleus RNA-sequencing. A cross-reactive band appears at approximately 75 kd, with the dCTCF band appearing at approximately 130 kd. Abd-B (bottom) in each cluster (left) and in bulk (right) for control (teal) and dCTCF mat-/nuclei (pink). Asterisks indicate statistically significant differential expression (absolute value of expression > = 1.5 and Bonferroni corrected p-value < 0.05). (TIF)